Version: V11

RAG content processing in VIDIZMO

VIDIZMO uses Retrieval-Augmented Generation (RAG) to improve how large language models (LLMs) generate responses by incorporating relevant content from your VIDIZMO portal. Before generating a response, the system retrieves applicable content and includes it in the prompt, ensuring that responses are based on your organization’s stored information.

With VIDIZMO RAG content processing, you can:

  • Process and embed content using consistent, standardized APIs

  • Store embeddings in supported vector databases for efficient retrieval

  • Optimize retrieval results through configurable chunking and embedding approaches

  • Track and evaluate embedding and retrieval performance using observability tools

Supported content types

RAG content processing works with the following file types:

  • Video files
    Video content that includes metadata and timed data, such as transcriptions and captions.

  • Audio files
    Audio content that includes transcriptions and associated metadata.

  • Image files
    Image content with descriptive metadata used for indexing and retrieval.

  • Document files
    Text-based documents, such as PDF and Word files, with extractable text content.

How does Retrieval-Augmented Generation work in VIDIZMO?

When you upload content to VIDIZMO, the platform automatically prepares it for AI-powered search and retrieval. This preparation happens in the background and requires no manual intervention after initial configuration.

Content ingestion and preparation

VIDIZMO processes your uploaded content through its standard media pipeline. During upload, the system performs encoding, transcoding, and generates thumbnails and posters for visual navigation. Alongside these operations, the platform extracts information that will later power your RAG-enabled chatbot.

For documents, VIDIZMO extracts the full text content and divides it into logical chunks based on pages and paragraphs. This chunking strategy ensures that when users ask questions, the system retrieves only the relevant sections rather than entire documents.
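VIDIZMO's exact chunking rules are internal, but a minimal sketch of a page-and-paragraph splitter, with illustrative function and field names, conveys the idea:

```python
# Illustrative sketch only; VIDIZMO's internal chunking rules are not public.
# Splits extracted page text into paragraph-level chunks, tagged with page and
# paragraph indices so retrieval can point back to the exact section.

def chunk_document(pages: list[str]) -> list[dict]:
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        # Treat blank lines as paragraph boundaries.
        paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
        for para_no, paragraph in enumerate(paragraphs, start=1):
            chunks.append({"text": paragraph, "page": page_no, "paragraph": para_no})
    return chunks

pages = ["First paragraph.\n\nSecond paragraph.", "Third paragraph on page two."]
for chunk in chunk_document(pages):
    print(chunk)
```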

For video and audio files, VIDIZMO retrieves timed data—transcriptions, captions, and other time-synchronized text. This allows the chatbot to reference specific moments in your media when answering questions.

For all content types, VIDIZMO collects and consolidates metadata including titles, descriptions, summaries, tags, and any custom attributes you've defined. This metadata is embedded alongside the content to enrich the semantic understanding and improve retrieval accuracy.
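As a rough illustration, the consolidated record for a single piece of content might carry fields like the ones below; these field names are assumptions for the sketch, not VIDIZMO's actual schema:

```python
# Hypothetical shape of a consolidated content record before embedding.
from dataclasses import dataclass, field

@dataclass
class ContentRecord:
    content_id: str
    content_type: str                       # "video", "audio", "image", or "document"
    title: str
    description: str = ""
    summary: str = ""
    tags: list[str] = field(default_factory=list)
    custom_attributes: dict[str, str] = field(default_factory=dict)
    # Timed data for video/audio: (start_seconds, end_seconds, text) segments.
    timed_segments: list[tuple[float, float, str]] = field(default_factory=list)
    # Document text chunks produced by a page/paragraph splitter.
    text_chunks: list[str] = field(default_factory=list)

record = ContentRecord(
    content_id="vid-001",
    content_type="video",
    title="Data Retention Training",
    tags=["compliance", "policy"],
    timed_segments=[(0.0, 12.5, "Welcome to the data retention training...")],
)
```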

Embedding generation and storage

After extraction, VIDIZMO converts the extracted text and metadata into vector embeddings using a configured embedding model, orchestrated through graph-based processing. Vector embeddings are numerical representations that capture the semantic meaning of text. Content with similar meaning is placed close together in vector space, which allows the system to find related information even when exact keywords don't match.
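A small sketch of the embedding step, using the open-source sentence-transformers library as a stand-in for whichever embedding model your environment is configured with:

```python
# Stand-in for the configured embedding model; any model that maps text to
# fixed-length vectors plays the same role in the pipeline.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "Client data must be retained for seven years.",
    "Our retention period for client records is 7 years.",
    "Thumbnails are generated during transcoding.",
]
embeddings = model.encode(texts)  # one fixed-length vector per input text

# Semantically similar sentences land close together in vector space, so the
# first two vectors are nearer to each other than either is to the third.
print(embeddings.shape)  # (3, 384) for this particular model
```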

The agentic service generates vectors for three categories of information:

  • Metadata from titles, descriptions, summaries, tags, and custom attributes
  • Timed data from video and audio transcriptions
  • Document text from chunked pages and paragraphs

These embeddings are stored in a vector database alongside references to their source content. The vector database enables similarity search, which finds content based on meaning rather than exact keyword matches.
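The sketch below shows the core idea with a tiny in-memory store and cosine similarity; a production deployment would use one of the supported vector databases rather than this toy class:

```python
# Toy in-memory stand-in for a vector database: stores normalized embeddings
# with references back to their source content and answers cosine-similarity
# queries.
import numpy as np

class TinyVectorStore:
    def __init__(self):
        self.vectors: list[np.ndarray] = []
        self.refs: list[dict] = []  # pointers back to source content

    def add(self, vector: np.ndarray, ref: dict) -> None:
        self.vectors.append(vector / np.linalg.norm(vector))
        self.refs.append(ref)

    def search(self, query: np.ndarray, top_k: int = 3) -> list[dict]:
        query = query / np.linalg.norm(query)
        scores = np.array([v @ query for v in self.vectors])  # cosine similarity
        best = np.argsort(scores)[::-1][:top_k]
        return [{"score": float(scores[i]), **self.refs[i]} for i in best]
```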

Query processing and response generation

When a user submits a prompt through the chatbot, the LLM analyzes the request and determines how to respond. The system supports four response strategies:

Content-based responses occur when the user's question relates to information stored in your VIDIZMO library. The system converts the user's query into an embedding using the same model that processed your content. It then performs a similarity search against the vector database to find content with the closest semantic match. The system retrieves the original text from the matching chunks and passes it to the LLM as context. The LLM generates a response grounded in your actual content rather than its general training data. If the answer references a specific video, document, or audio file, the chatbot includes a direct link so users can access the source material.
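As an illustration, retrieved chunks might be folded into the LLM's context roughly like this; the prompt wording and source-link handling are assumptions for the sketch:

```python
# Assembles retrieved chunks and their source links into a grounded prompt.
def build_rag_prompt(question: str, hits: list[dict]) -> str:
    blocks = []
    for hit in hits:
        # Each hit carries the original chunk text plus a source reference,
        # so the final answer can link back to the video, document, or audio.
        blocks.append(f"Source: {hit['source_url']}\n{hit['text']}")
    context = "\n\n---\n\n".join(blocks)
    return (
        "Answer the question using only the context below, and cite the "
        "source link for any content you reference.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```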

Generic responses handle questions that don't require your organization's specific content. The LLM answers using its general knowledge without consulting the vector database.

Web search responses address questions that require current information from the internet. The system performs a web search, incorporates the results, and generates a response based on that external data.

Tool-based responses execute specific actions when the user's prompt requires system functionality beyond text generation. The system invokes the appropriate tool with parameters extracted from the user's request.
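Conceptually, the routing between the four strategies reduces to a dispatcher like the one below; in VIDIZMO the LLM itself selects the strategy, and the stub functions here merely stand in for the pipelines described above:

```python
from enum import Enum

class Strategy(Enum):
    CONTENT_BASED = "content"   # ground the answer in portal content (RAG)
    GENERIC = "generic"         # answer from the model's general knowledge
    WEB_SEARCH = "web"          # pull current information from the internet
    TOOL_BASED = "tool"         # invoke system functionality with parameters

# Stubs standing in for the real pipelines.
def answer_with_rag(prompt: str) -> str: return f"[RAG answer: {prompt}]"
def answer_generically(prompt: str) -> str: return f"[generic answer: {prompt}]"
def answer_with_web_search(prompt: str) -> str: return f"[web answer: {prompt}]"
def invoke_tool(prompt: str) -> str: return f"[tool result: {prompt}]"

def respond(strategy: Strategy, prompt: str) -> str:
    if strategy is Strategy.CONTENT_BASED:
        return answer_with_rag(prompt)
    if strategy is Strategy.WEB_SEARCH:
        return answer_with_web_search(prompt)
    if strategy is Strategy.TOOL_BASED:
        return invoke_tool(prompt)
    return answer_generically(prompt)
```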

Required components

To use RAG content processing, your VIDIZMO environment requires the following components:

  • VIDIZMO Indexer
    Extracts text, metadata, and timed data from uploaded content for further processing.

  • Agentic service
    Processes content through graph-based execution and converts extracted text, summaries, tags, and custom attributes into vector representations using a configured embedding model.

  • Vector database
    Stores embeddings and enables similarity search to retrieve relevant content efficiently.

  • LLM service
    Processes user prompts and generates contextual responses using the retrieved content as context.

Example scenario

A compliance team uploads training videos and policy documents to their VIDIZMO portal. The platform automatically extracts transcriptions from the videos and chunks the policy documents into searchable sections. All content is converted to embeddings and stored in the vector database.

When an employee asks the chatbot, "What is our policy on client data retention?", the system:

  1. Converts the question into a vector embedding
  2. Searches the vector database for semantically similar content
  3. Retrieves the relevant text chunks from the data retention policy
  4. Passes these chunks to the LLM as context
  5. Generates a response that summarizes the key requirements with a link to the source document

This approach ensures employees receive accurate answers based on approved organizational content rather than generic information.
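For illustration, the numbered steps map directly onto the helpers sketched earlier (the embedding model, TinyVectorStore, and build_rag_prompt); the portal URL below is hypothetical:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
store = TinyVectorStore()  # defined in the similarity-search sketch above

# At upload time: index a chunk from the retention policy.
policy_chunk = "Client data is retained for seven years after contract end."
store.add(model.encode([policy_chunk])[0],
          {"text": policy_chunk, "source_url": "https://portal.example/policy.pdf"})

# Steps 1-2: embed the question and search the vector store.
question = "What is our policy on client data retention?"
hits = store.search(model.encode([question])[0], top_k=1)

# Steps 3-4: pass the retrieved chunks to the LLM as context.
prompt = build_rag_prompt(question, hits)

# Step 5: the LLM (not shown here) generates the grounded answer, citing the
# source link carried by each retrieved chunk.
print(prompt)
```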